Ordinary Kriging (OK)#

  • Ordinary Kriging (OK) is a commonly used geostatistical method.

  • OK provides the best linear unbiased estimates (BLUE), where the estimated value for the point of interest is a weighted linear combination of sampled observations (i.e., the sum of weights is 1) Matheron1963.

  • OK is similar to but more advanced than Inverse distance weighting, as the weight 𝜆𝑖 of OK is estimated by minimizing the variance of the prediction errors.

    • This is achieved by constructing a semivariogram that models the difference between neighboring values.

  • Compared to non-geostatistical algorithms, the strength of ordinary kriging is its ability to model the spatial structure (variance) of the sampled observations.

  • An assumption of ordinary kriging is data stationarity. That is, the mean of the interpolated variable is constant within the search window, which is often not true. This makes OK unsuitable for interpolation over large domains and often requires data transformation.

{note} Thank you to Xinli Cai for this great description of OK if her [master thesis](https://era.library.ualberta.ca/items/92cdc6ae-43fd-453f-91f2-5ff275cf85cd/view/164484ed-e950-408c-8be7-39d3764bdc15/Cai_Xinli_201704_MSc.pdf)

[1]:
import context
import salem
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
from pykrige.ok import OrdinaryKriging

import plotly.express as px
from datetime import datetime

from utils.utils import pixel2poly, plotvariogram
from context import data_dir
******************************
context imported. Front of path:
/Users/rodell/krige-smoke
/Users/rodell/krige-smoke/docs/source
******************************

through /Users/rodell/krige-smoke/docs/source/context.py -- pha

Open the reformated data with the linear, meter-based Lambert projection (EPSG:3347). Again this is helpful as lat/lon coordinates are not suitable for measuring distances which is vital for spatial interpolation.

[2]:
df = pd.read_csv(str(data_dir) + "/obs/gpm25.csv")
gpm25 = gpd.GeoDataFrame(
    df,
    crs="EPSG:4326",
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
).to_crs("EPSG:3347")
gpm25["Easting"], gpm25["Northing"] = gpm25.geometry.x, gpm25.geometry.y
gpm25.head()
[2]:
Unnamed: 0 id datetime lat lon PM2.5 geometry Easting Northing
0 2 42073 2021-07-16 22:00:00 47.185173 -122.176855 0.862 POINT (3972238.901 1767531.888) 3.972239e+06 1.767532e+06
1 3 53069 2021-07-16 22:00:00 47.190197 -122.177992 1.078 POINT (3972419.863 1768071.699) 3.972420e+06 1.768072e+06
2 12 10808 2021-07-16 22:00:00 40.507316 -111.899188 9.780 POINT (4460286.051 743178.640) 4.460286e+06 7.431786e+05
3 16 85391 2021-07-16 22:00:00 48.454213 -123.283643 0.874 POINT (3964698.001 1931774.531) 3.964698e+06 1.931775e+06
4 21 79095 2021-07-16 22:00:00 47.672130 -122.514183 0.784 POINT (3974631.739 1827689.201) 3.974632e+06 1.827689e+06

Create Grid#

Here, we will create a grid we want to use for the interpolation. NOTE we will use salem to create a dataset with the grid. This grid as a xarray dataset will be helpful for the universal kriging when we reproject other gridded data to act as covariances for interpolation.

[3]:
## define the desired grid resolution in meters
resolution = 20_000  # grid cell size in meters

## make grid based on dataset bounds and resolution
gridx = np.arange(gpm25.bounds.minx.min(), gpm25.bounds.maxx.max(), resolution)
gridy = np.arange(gpm25.bounds.miny.min(), gpm25.bounds.maxy.max(), resolution)

## use salem to create a dataset with the grid.
krig_ds = salem.Grid(
    nxny=(len(gridx), len(gridy)),
    dxdy=(resolution, resolution),
    x0y0=(gpm25.bounds.minx.min(), gpm25.bounds.miny.min()),
    proj="epsg:3347",
    pixel_ref="corner",
).to_dataset()
## print dataset
krig_ds
[3]:
<xarray.Dataset>
Dimensions:  (x: 145, y: 122)
Coordinates:
  * x        (x) float64 3.46e+06 3.48e+06 3.5e+06 ... 6.3e+06 6.32e+06 6.34e+06
  * y        (y) float64 4.984e+05 5.184e+05 5.384e+05 ... 2.898e+06 2.918e+06
Data variables:
    *empty*
Attributes:
    pyproj_srs:  +proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=4...

Setup OK#

[4]:
nlags = 15
variogram_model = "spherical"

startTime = datetime.now()
krig = OrdinaryKriging(
    x=gpm25["Easting"],
    y=gpm25["Northing"],
    z=gpm25["PM2.5"],
    variogram_model=variogram_model,
    # enable_statistics=True,
    nlags=nlags,
)
print(f"OK build time {datetime.now() - startTime}")
OK build time 0:00:00.305069

Variogram#

  • Graphical representation of spatial autocorrelation.

  • Shows a fundamental principle of geography: closer things are more alike than things farther apart

  • Its created by calculating the difference squared between the values of the paired locations

    • paired locations are binned by the distance apart

  • An empirical model is fitted to the binned (paired locations) to describe the likeness of data at a distance.

  • Type of empirical models

    • Circular

    • Spherical

    • Exponential

    • Gaussian

    • Linear

  • The fitted model is applied in the interpolation process by forming (kriging) weights for the predicted areas.

  • Three parameters that define a variogram..

    • sill: the total variance where the empirical model levels off,

      • is the sum of the nugget plus the sills of each nested structure.

    • (effective) range: The distance after which data are no longer correlated.

      • About the distance where the variogram levels off to the sill.

    • nugget: Related to the amount of short range variability in the data.

      • Choose a value for the best fit with the first few empirical variogram points.

      • A nugget that’s large relative to the sill is problematic and could indicate too much noise and not enough spatial correlation.

variogram statistics#

A good model should result in - Q1 close to zero

  • Q2 close to one

  • cR as small as possible. TODO define above stats variables.

PyKrige will optimze most parmters based on the defiend empirela mode and number of bins - I tested several empirical models and bin sizes and found (for this case study) spherical and 15 bins to be the optimal based on the output statics. {note} the literature supports spherical for geospatial interpolation applications over other methods.

[5]:
plotvariogram(krig)
_images/comps-ok_10_0.png

Execute OK#

Interpolate data to our grid using OK.

[6]:
startTime = datetime.now()
z, ss = krig.execute("grid", gridx, gridy)
print(f"OK execution time {datetime.now() - startTime}")
OK_pm25 = np.where(z < 0, 0, z)

# krig_ds["OK_pm25"] = (("y", "x"), OK_pm25)
OK execution time 0:00:04.007741

Plot OK Modelled PM2.5#

Convert data to polygons to be plot-able on a slippy mapbox. The conversion is not necessary, just fun to plot on a slippy map :)

[7]:
polygons, values = pixel2poly(gridx, gridy, OK_pm25, resolution)
pm25_model = gpd.GeoDataFrame(
    {"Modelled PM2.5": values}, geometry=polygons, crs="EPSG:3347"
).to_crs("EPSG:4326")

fig = px.choropleth_mapbox(
    pm25_model,
    geojson=pm25_model.geometry,
    locations=pm25_model.index,
    color="Modelled PM2.5",
    color_continuous_scale="jet",
    center={"lat": 50.0, "lon": -110.0},
    zoom=2.5,
    mapbox_style="carto-positron",
    opacity=0.8,
)
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10))
fig.update_traces(marker_line_width=0)
fig.show()

Onto Universal Kriging…#